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ABSTRACT 

We provide algorithms computing power series solutions of 
a large class of differential or g-differential equations or sys- 
tems. Their number of arithmetic operations grows linearly 
with the precision, up to logarithmic terms. 



1. INTRODUCTION 

Truncated power series are a fundamental class of objects 
of computer algebra. Fast algorithms are known for a large 
number of operations starting from addition, derivative, in- 
tegral and product and extending to quotient, powering and 
several more. The main open problem is composition: given 
two power series / and g, with g(0) = 0, known mod as , 
the best known algorithm computing f(g) mod x N has a 
cost which is roughly that of vTV products in precision TV; 
it is not known whether quasi-linear (i.e., linear up to log- 
arithmic factors) complexity is possible in general. Better 
results are known over finite fields [4, 25] or when more in- 
formation on / or g is available. Quasi-linear complexity 
has been reached when g is a polynomial [11], an algebraic 
series [19] , or belongs to a large class containing for instance 
the expansions of exp(:r) — 1 and log(l + x) [8]. 

One motivation for this work is to deal with the case 
when / is the solution of a given differential equation. Us- 
ing the chain rule, a differential equation for f(g) can be 
derived, with coefficients that are power series. We focus on 
the case when this equation is linear, since in many cases lin- 
earization is possible [5]. When the order n of the equation 
is larger than 1, we use the classical technique of convert- 
ing it into a first-order equation over vectors, so we consider 
equations of the form 



x k 5(F) = AF + C, 



(1) 
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where A is an n x n matrix over the power series ring K[[aj]] 
(K being the field of coefficients), C and the unknown F 
are size n vectors over K[[as]] and for the moment 8 denotes 
the differential operator d/dx. The exponent k in (1) is a 
non-negative integer that plays a key role for this equation. 

By solving such equations, we mean computing a vector F 
of power series such that (1) holds modulo x . For this, we 
need only compute F polynomial of degree less than TV + 1 
(when k = 0) or TV (otherwise). Conversely, when (1) has a 
power series solution, its first TV coefficients can be computed 
by solving (1) modulo x N (when k ^ 0) or a;^^ 1 (otherwise). 

If k = and the field K has characteristic 0, then a for- 
mal Cauchy theorem holds and (1) has a unique vector of 
power series solution for any given initial condition. In this 
situation, algorithms are known that compute the first TV 
coefficients of the solution in quasi-linear complexity [5] . In 
this article, we extend the results of [5] in three directions: 

Singularities. We deal with the case when k is positive. A 
typical example is the computation of the composition F = 
f(g) when / is Gauss' 2F1 hypergeometric series. Although 
/ is a very nice power series 
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we exploit this structure indirectly only. We start from the 
differential equation 



x(x - 1)/" + (a: (a + b + 1) - c)f + abf = 



(2) 



and build up and solve the more complicated 

9(9-1) „ g' 2 (g(a + b+l)-c) + (g-g 2 )g" 
^ r H li 



F'+abF = 



in the unknown F, g being given, with <?(0) = 0. Equa- 
tion (2) has a leading term that is divisible by x so that 
Cauchy's theorem does not apply and indeed there does not 
exist a basis of two power series solutions. This behavior is 
inherited by the equation for F, so that the techniques of [5] 
do not apply — this example is actually already mentioned 
in [11], but the issue with the singularity at was not ad- 
dressed there. We show in this article how to overcome this 
singular behavior and obtain a quasi-linear complexity. 

Positive characteristic. Even when k — 0, Cauchy's the- 
orem does not hold in positive characteristic and Eq. (1) 



may fail to have a power series solution (a simple example 
is F' — F). However, such an equation may have a solution 
modulo x . Efficient algorithms finding such a solution are 
useful in conjunction with the Chinese remainder theorem. 
Other motivations for considering algorithms that work in 
positive characteristic come from applications in number- 
theory based cryptology or in combinatorics [7, 8, 10]. 

Our objectives in this respect are to overcome the lack of a 
Cauchy theorem, or of a formal theory of singular equations, 
by giving conditions that ensure the existence of solutions 
at the required precisions. More could probably be said 
regarding the p-adic properties of solutions of such equations 
(as in [6, 27]), but this is not the purpose of this paper. 

Functional Equations. The similarity between algorithms 
for linear differential equations and for linear difference equa- 
tions is nowadays familiar to computer algebraists. We thus 
use the standard technique of introducing a : K[[a;]] — > K[[x]] 
a unitary ring morphism and letting 5 : K[[a;]] — >• K[[x]] de- 
note a cr-derivation, in the sense that 5 is K-linear and that 
for all /, g in K [[*]], we have 

S(fg) = fS(g) + 5(f)a(g). 

These definitions, and the above equality, carry over to ma- 
trices over K[[x]]. Thus, our goal is to solve the following 
generalization of (1): 

x k 5(F) = Aa(F) + C. (3) 

As above, we are interested in computing a vector F of power 
series such that (3) holds mod x . 

One motivation for this generalization comes from cod- 
ing theory. The list-decoding of the folded Reed-Solomon 
codes [18] leads to an equation Q(x, f(x), f(qx)) — where 
Q is a known polynomial. A linearized version of this is of 
the form (3), with a : 4>{x) i-> cj>(qx). In cases of interest we 
have k = 1, and we work over a finite field. 

In view of these applications, we restrict ourselves to the 
following setting: 

8(x) = 1, o : x i — y qx, 
for some gel \ {0}. Then, there are only two possibilities: 

• q — 1 and 5 : f t-> f' (differential case); 

• q 1 and 5 : f n> !{q ^~_[\ x) (q- differential case). 
As a consequence, 5(1) = and for all i > 0, we have 
5(x 1 ) = 'jiX 1-1 with 70 = and 74 = l+q+- ■ ■+q l ~ 1 (i > 0). 
By linearity, given / = £\> fax 1 G K[[x]], 

i>l 

can be computed mod x N in O(N) operations, as can a(f). 
Conversely, assuming that 71 , . . . , 7„ are all non-zero in K, 
given / of degree at most n— 1 in K[x], there exists a unique 
g of degree at most n such that 5(g) — f and go = 0; it is 
given by g = £)o<»<n-i fi/n+i x%+1 and can be computed 
in O(N) operations. We denote it by g = j q f. In particular, 
our condition excludes cases where q is a root of unity of low 
order. 

Notation and complexity model. We adopt the conven- 
tion that uppercase letters denote matrices or vectors while 



lowercase letters denote scalars. The set ofnxm matrices 
over a ring R is denoted ^ n ,m(R)\ when n = m, we write 
,/£n(R)- If / is in K[[x]], its degree i coefficient is written fi\ 
this carries over to matrices. The identity matrix is written 
Id (the size will be obvious from the context). To avoid any 
confusion, the entry (i,j) of a matrix M is denoted M^'-'K 

Our algorithms are sometimes stated with input in K[[x]], 
but it is to be understood that we are given only truncations 
of A and C and only their first N coefficients will be used. 

The costs of our algorithms are measured by the number 
of arithmetic operations in K they use. We let M : N — > N be 
such that for any ring R, polynomials of degree less than n 
in R[x] can be multiplied in M(n) arithmetic operations in 7?. 
We assume that M(n) satisfies the usual assumptions of [17, 
§8.3]; using Fast Fourier Transform, M(n) can be taken in 
0(nlog(n) loglog(n)) [13, 28]. We note cj G (2,3] aconstant 
such that two matrices in ^£ n (R) can be multiplied in 0(n") 
arithmetic operations in 7?. The current best bound is uj < 
2.3727 ([31] following [14, 30]). 

Our algorithms rely on linear algebra techniques; in par- 
ticular, we have to solve several systems of non-homogeneous 
linear equations. For U in ^£ n (J£) and V in „# n ,i(]K), we 
denote by LinSolve((7A = V) a procedure that returns _L if 
there is no solution, or a pair F, K, where F is in ^ nj i(]K) 
and satisfies UF — V, and K £ ^# n> t(K), for some t < n, 
generates the nullspace of U. This can be done in time 
0(n u ). In the pseudo-code, we adopt the convention that if 
a subroutine returns _L, the caller returns _L too (so we do 
not explicitly handle this as a special case). 

Main results. Equation (3) is linear, non-homogeneous in 
the coefficients of F, so our output follows the convention 
mentioned above. We call generators of the solution space 
of Eq. (3) at precision iV either _L (if no solution exists) or a 
pair F, K where F G ^£„ t i(K[x}) and K € ^# ni t(K[x]) with 
t < nN, such that for G G .^„,i(K[x]), with deg(G) < N, 
x k 5(G) = Aa(G) + C mod x N if and only if G can be written 
G — F + KB for some B G ..# ( ,i(K). 

Seeing Eq. (3) as a linear system, one can obtain such an 
output using linear algebra in dimension nN. While this 
solution always works, we give algorithms of much better 
complexity, under some assumptions related to the spec- 
trum Spec^o of the constant coefficient Aq of A. First, we 
simplify our problem: we consider the case k = as a special 
case of the case k = 1. Indeed, the equation 5(F) — Aa(F) + 
C mod x N is equivalent to xd(F) = Pa(F) + Q mod x N+1 , 
with P = xA and Q = xC. Thus, in our results, we only 
distinguish the cases k = 1 and k > 1. 

Definition 1. The matrix Aq has good spectrum at pre- 
cision iV when one of the following holds: 

• k = 1 and Specy4 n(<f Spec A -Ji) = fori < i < N 

• k > 1, Aq is invertible and 

— q — 1, 7i,...,7jv_fc are non-zero, ISpecAol = n 

and Spec Ao C K; 

- q^l and Spec A Q n q i Spec A = for 1 < i < N. 

In the classical case when K has characteristic and q = 1, if 
k — 1, Ao has good spectrum when no two eigenvalues of Ao 
differ by a non-zero integer (this is e.g. the case when Ao = 
0, which is essentially the situation of Cauchy's theorem; 



this is also the case in our 2F1 example whenever cval(g) is 
not an integer, since Spec A) = {0,val(g)(l — c) — 1}). 

These conditions could be slightly relaxed, using gauge 
transformations (see [1, Ch. 2] and [2, 3]). Also, for k > 1 
and q = 1, we could drop the assumption that the eigenval- 
ues are in K, by replacing K by a suitable finite extension, 
but then our complexity estimates would only hold in terms 
of number of operations in this extension. 

As in the non-singular case [5], we develop two approaches. 
The first one is a divide-and-conquer method. The problem 
is first solved at precision N/2 and then the computation 
at precision N is completed by solving another problem of 
the same type at precision N/2. This leads us to the fol- 
lowing result, proved in Section 2 (see also that section for 
comparison to previous work). In all our cost estimates, we 
consider k constant, so it is absorbed in the big-Os. 

Theorem 1. Algorithm 2 computes generators of the so- 
lution space of Eq. (3) at precision N by a divide- and- conquer 
approach. Assuming Ao has good spectrum at precision N , 
it performs in time 0(n"M(iV) log(iV)). When either k > 1 
or k — 1 and q r Ao — 7 Jd is invertible for < i < N, this 
drops to 0{n 2 M(N) log(AT) + n"N). 

Our second algorithm behaves better with respect to N, 
with cost in 0(M(N)) only, but it always involves polyno- 
mial matrix multiplications. Since in many cases the divide- 
and-conquer approach avoids these multiplications, the sec- 
ond algorithm becomes preferable for rather large precisions. 

In the differential case, when k — and the characteristic 
is 0, the algorithms in [5, 11] compute an invertible ma- 
trix of power series solution of the homogeneous equation 
by a Newton iteration and then recover the solution using 
variation of the constant. In the more general context we 
are considering here, such a matrix does not exist. How- 
ever, it turns out that an associated equation that can be 
derived from (3) admits such a solution. Section 3 describes 
a variant of Newton's iteration to solve it and obtains the 
following. 

Theorem 2. Assuming Ao has good spectrum at preci- 
sion N , one can compute generators of the solution space 
of Eq. (3) at precision N by a Newton-like iteration in time 
0(n"M(N) +n" log(n)iV). 

To the best of our knowledge, this is the first time such a low 
complexity is reached for this problem. Without the good 
spectrum assumption, however, we cannot guarantee that 
this algorithm succeeds, let alone control its complexity. 

2. DIVIDE-AND-CONQUER 

The classical approach to solving (3) is to proceed term- 
by-term by coefficient extraction. Indeed, we can rewrite 
the coefficient of degree i in this equation as 

R l F l = Aj, (4) 

where A, is a vector that can be computed from A, C and 
all previous Fj (and whose actual expression depends on k), 
and Ri is as follows: 

Rr = {q l A - 7i ld) iffc = l 
R i = q i A iffc>l. 

Ideally, we wish that each such system determines Fi uniquely 
that is, that Ri be a unit. For k = 1, this is the case when 



Algorithm 1: Recursive Divide-and-Conquer 

RDACW,C,i, N,k) 

input : A G .# n (K[[x]]), C G % u 

i e n,N e N\ {0},fc g N\ {0} 

output: F G , c £i+N 
if N = 1 then 

if k = 1 then R4 := q i A - 7 jld else R t := q i A 

if det(i?i) = then return Fi 

else return — R~ Co 
else 

m := \N/2] 

H := RDAC(A,C,i,m,fe) 

D := (C - x k S(K) + (q'A ~ 7l a; fe - 1 ld)cr(H)) div x 
K := RDAC( J 4, D, i + m, N - m, k) 
return H + x m K 
end 



i is not a root of the indicial equation det(gMo — 7 Jd) = 0. 
For k > 1, either this is the case for all i (when Ao is in- 
vertible) or for no i. In any case, we let 7Z be the set of 
indices i G {0, . . . , N — 1} such that det(Ri) = 0; we write 
K = {ji <■■■ < >}, so that r=\TZ\. 

Even when 1Z is empty, so the solution is unique, this 
approach takes quadratic time in N, as computing each in- 
dividual A, takes linear time in i. To achieve quasi-linear 
time, we split the resolution of Eq. (3) mod x N into two half- 
sized instances of the problem; at the leaves of the recursion 
tree, we end up having to solve the same Eq. (4). 

When 1Z is empty, the algorithm is simple to state (and 
the cost analysis simplifies; see the comments at the end of 
this section). Otherwise, technicalities arise. We treat the 
cases i G 1Z separately, by adding placeholder parameters for 
all corresponding coefficients of F (this idea is already in [2, 
3] ; the algorithms in these references use a finer classification 
when k > 1, by means of a suitable extension of the notion 
of indicial polynomial, but take quadratic time in N). 

Let fi,!, . . . , f n:r be nr new indeterminates over K (below, 
all boldface letters denote expressions involving these formal 
parameters). For p = 1, . . . , r, we define the vector F Jp with 
entries fi jP) . . . , f n . p and we denote by J£ the set of all vectors 

F = ip + fiF J1 H h prF^, 

with tpo in ^ ni i(K[x]) and each ipi in ^ n (K[x]) for 1 < £ < 
r. We also define J?fi the subspace of vectors of the form 

F = ipo + ip{F h H h W(i) F M(0' 

where /i(i) is defined as the index of the largest element 
je G 1Z such that je < i; if no such element exist (for instance 
when i = 0), we let fj,(i) — 0. A specialization S : Jzf — > 
^Ci,i(IK[x]) is simply an evaluation map defined by f,^ h-> 
fi : e for all i, I, for some choice of (fi,e) in K nr . 

We extend 5 and a to such vectors, by letting S(fi t e) = 
and o-(fi t e) = f^ for all i,£, so that we have, for F in Jf 

5(F) = 8(<po) + %i)F« + ■ ■ ■ + S(<pr)V jr , 

and similarly for c(F). 

The main divide-and-conquer algorithm first computes F 
in I£ , by simply skipping all equations corresponding to in- 
dices i G 7Z; it is presented in Algorithm 2. In a second step, 
we resolve the indeterminacies by plain linear algebra. For 



i > 0, and F, C in J£ , we write 

E(F,C,i) = x k S(F)- ({q 1 A-'y 1 x k - 1 \d)a{F) + C). 

In particular, E(F, C, 0) is a parameterized form of Eq. (3). 
The key to the divide-and-conquer approach is to write H = 
F modi" 1 , K = F div x m and D = (C-E(H, C, i)) div x m . 
Using the equalities 

x k 5(F) = x k 5{H) + x m+k 5(K) + 7m s m+fe -V(K) 

and 7i+ m = 7™, + g'™7i , a quick computation shows that 

E(F,C,i) = (_E(H,C,i) modi m )+a; ra £(K,D,i+m). (5) 

Lemma 1. Let A be in ^ n (¥L[x]) and C in _S%, and Ze£ 
F = RDAC(A, C, i, M, k) with i + M < N. Then: 

1. F is in J£i+M; 

2. for j G {0, . . . ,M — 1} such that i + j £ TZ, the equality 
coeff (£(F, C, i),x J ) = holds; 

3. if C and F in ^#„ ; i(K[a;]) with degF < M are such 
that E(F, C, i) — mod x M and there exists a special- 
ization S : Jzfj — > .#,,,1 (K[x]) s«c/i that C = 5(C), 
£/iere exists a specialization S' : JS+m — > ~#n,i(IK[a;]) 
which extends S and such that F = S(F). 

F is computed in time 0({n 2 + rn")M(M) log(M) + n"M). 

Proof. The proof is by induction on M. 
Proof of 1. For M = 1, we distinguish two cases. If i G TZ, 
say i = ji, we return Fj = Fj e . In this case, + 1) = £, so 
our claim holds. If i £ TZ, because Co G the output is 
in _Sf; as well. This proves the case M = 1. 

For M > 1, we assume the claim to hold for all (i,M'), 
with A/' < M. By induction, H G Jz* i+m and K G M+M- 
Thus, D G J£i+m and the conclusion follows. 
PROOF of 2. For M = 1, if i G 7?., the claim is trivially 
satisfied. Otherwise, we have to verify that the constant 
term of E(F, C, i) is zero. In this case, the output F is 
reduced to its constant term Fo, and the constant term of 
E(F, C, i) is (up to sign) _R;Fo + Co = 0, so we are done. 

For M > 1, we assume that the claim holds for all (i, M'), 
with M' < M. Take j in {0, . . . , M - 1}. If j < m, we have 
coeff(£(F,C,i),a; j ) = coeff (£(H, C, i), x J ); since i + j g 7^, 
this coefficient is zero by assumption. If m < j, we have 
coeff(£(F,C,i),2; j ) = coeff (E(K, D, i), x j ~ m ). Now, j+i ^ 
72. implies that (j — m) + (i + wi) ^ 7?., and j — m < M — m, 
so by induction this coefficient is zero as well. 

Proof of 3. For M = 1, if i G TZ, say i = ji, we have 
F = Fj e , whereas F has entries in K; this allows us to define 
S' . When i £ TZ, we have F — S(F), so the claim holds as 
well. Thus, we are done for M = 1. 

For M > 1, we assume our claim for all (i,M') with 
M' < M. Write H = F mod x m , K = F div x m and 
D = (C-x k 6{H)+(q i A-j i x k ~ 1 \d)o-(H)) div x m . Then, (5) 
implies that E(H, G, i) = mod x m and f?(if, D,i + m) 
'.) mod i A/_m . The induction assumption shows that H is a 
specialization of H, say H = S'(H.) for some S' : J/Ci+m — > 
^n,i(lK[a;]) which extends S 1 . In particular, D — S'(D). 
The induction assumption also implies that there exist an 
extension S" : Jd+ m — > ^#„,i(K[x]) of S 1 ', and thus of S, 
such that A' = S"(K). Then F = 5"(F), so we are done. 

For the complexity analysis, the most expensive part of 
the algorithm is the computation of D. At the inner re- 
cursion steps, the bottleneck is the computation of Ar(H), 



Algorithm 2: Divide-and-Conqucr 
DAC(A,C, N,k) 

input : A G i»(I[[i]]),Ce ^( nA (K[[x]]), 

N GN\{0},fc GN\{0} 
output: Generators of the solution space of 

x k S(F) = Aa(F) + C at precision N. 
F := RDAC(A, C, 0, N, k) 

(F has the form tpo + tpiFj^ + ■ ■ ■ + ip r Fj r ) 
T := x k S(F) - Ao{F) - C mod x N 
T-(T^, ieTZ, j = l,...,n) 
$, A := LinSolve(r = 0) 
M := [cp u ...,ip r ] 
return ip + M$, MA 



where H has degree less than M and A can be truncated 
mod x M (the higher degree terms have no influence in the 
subsequent recursive calls). Computing cr(H) takes time 
0(N(n + rn 2 )) and the product is done in time 0((n 2 + 
rn w )M(M)); recursion leads to a factor log(Af). The base 
cases use O(M) matrix inversions of cost 0{n u ) and O(M) 
multiplications, each of which takes time 0{rn u ). □ 

The second step of the algorithm is plain linear algebra: 
we know that the output of the previous algorithm satisfies 
our main equation for all indices i ^ TZ, so we conclude by 
forcing the remaining ones to zero. 

Proposition 1. On input A, C, N, k as specified in Algo- 
rithm 2, this algorithm returns generators of the solution 
space of (3) mod x N in time 0((n' 2 + rn u )M(N)lag(N) + 
r 2 n bJ N + r w n w ). 

Proof. The first claim is a direct consequence of the con- 
struction above, combined with Lemma 1. For the cost es- 
timate, we need to take into account the computation of 
T, the linear system solving, and the final matrix products. 
The computation of T fits in the same cost as that of D 
in Algorithm 1, so no new contribution comes from here. 
Solving the system T — takes time 0((rn) u ). Finally, the 
product [cpi ■ ■ ■ tfir] A involves an n x (rn) matrix with entries 
of degree N and an (rn) x t constant matrix, with t < rn; 
proceeding coefficient by coefficient, and using block matrix 
multiplication in size n, the cost is 0(r 2 n" N). □ 

When all matrices Ri are invertible, the situation becomes 
considerably simpler: r = 0, the solution space has dimen- 
sion 0, there is no need to introduce formal parameters, the 
cost drops to O(n 2 M(A01og(A0 +n"N) for Lemma 1, and 
Proposition 1 becomes irrelevant. 

When Ao has good spectrum at precision N, we may not 
be able to ensure that r — 0, but we always have r < 1. 
Indeed, when k = 1, the good spectrum condition implies 
that for all < i < N and for j G N, the matrices Ri and 
Rj have disjoint spectra so that at most one of them can be 
singular. For k > 1, the good spectrum condition implies 
that all Ri are invertible, whence r — 0. This proves Thm. 1. 

Previous work. As said above, Barkatou and Pfltigel [3], 
then Barkatou, Broughton and Pfltigel [2] , already gave algo- 
rithms that solve such equations term-by-term, introducing 
formal parameters to deal with cases where the matrix Ri 
is singular. These algorithms handle some situations more 
finely than we do (e.g., the cases k > 2), but take quadratic 



time; our algorithm can be seen as a divide-and-conquer 
version of these results. 

In the particular case q 7^ 1, n = 1 and r — 0, another 
forerunner to our approach is Brent and Traub's divide-and- 
conquer algorithm [12]. That algorithm is analyzed for a 
more general cr, of the form a(x) = xq(x), as such, they are 
more costly than ours; when q is constant, we essentially end 
up with the approach presented here. 

Let us finally mention van der Hoeven's paradigm of re- 
laxed algorithms [19, 22, 23], which allows one to solve sys- 
tems such as (3) in a term-by-term fashion, but in quasi- 
linear time. The cornerstone of this approach is fast relaxed 
■multiplication, otherwise known as online multiplication, of 
power series. 

In [19, 20], van der Hoeven offers two relaxed multiplica- 
tion algorithms (the first one being similar to that of [16]); 
both take time 0(M(n) log(n)). When r = 0, this yields 
a complexity similar to Prop. 1 to solve Eq. (3), but it is 
unknown to us how this carries over to arbitrary r. 

When r = 0, both our divide-and-conquer approach and 
the relaxed one can be seen as "fast" versions of quadratic 
time term-by-term extraction algorithms. It should appear 
as no surprise that they are related: as it turns out, at least 
in simple cases (with k — 1 and n = 1), using the relaxed 
multiplication algorithm of [20] to solve Eq. (3) leads to 
doing exactly the same operations as our divide-and-conquer 
method, without any recursive call. We leave the detailed 
analysis of these observations to future work. 

For suitable "nice" base fields (e.g., for fields that sup- 
port Fast Fourier Transform), the relaxed multiplication al- 
gorithm in [19] was improved in [21, 24], by means of a 
reduction of the log(n) overhead. This raises the question 
whether such an improvement is available for divide-and 
conquer techniques. 

3. NEWTON ITERATION 
3.1 Gauge Transformation 

Let F be a solution of Eq. (3). To any invertible ma- 
trix W G ^# n (K[a;]), we can associate the matrix Y — 
W~ 1 F G ./# n (K[[a;]]). We are going to choose W in such a 
way that Y satisfies an equation simpler than (3). The heart 
of our contribution is the efficient computation of such a W. 

Lemma 2. LetW G ^#„(K[a;]) be invertible in ^„(K[[x]}) 
and let B G .#„(K[x]) be such that 

B = W~ 1 (x k 8(W) — Aa(W)) mod x N . (6) 
Then F in ^K n ,i(&[x]) satisfies 

x k 8{F) = Aa(F) + C mod x N (3) 
if and only ifY — W~ 1 F satisfies 

x k S(Y) = Ba(Y) + W' 1 C mod x N . (7) 
Proof. Differentiating the equality F — WY gives 
x k S(F) = x k S(W)a(Y) +x k W8(Y). 
Since x k S{W) = Aa(W) - WB mod x N , we deduce 
x k S(F)-Aa(F)-C = W(x k 5(Y)-Ba(Y)~W' 1 C) mod x N . 
Since W is invertible, the conclusion follows. □ 



Algorithm 3: PolCoeffsDE 



The systems (3) and (7) are called equivalent under the 
gauge transformation Y = WF. Solving (3) is thus reduced 
to finding a simple B such that (7) can be solved efficiently 
and such that the equation 

x k S(W) = Aa{W) - WB mod x N (8) 

that we call associated to (3) has an invertible matrix W 
solution that can be computed efficiently too. 

As a simple example, consider the differential case, with 
k = 1. Under the good spectrum assumption, it is custom- 
ary to choose B = Aq, the constant coefficient of A. In this 
case, the matrix W of the gauge transformation must satisfy 

xW' = AW - WA mod a;*. 

It is straightforward to compute the coefficients of W one 
after the other, as they satisfy Wo ~ Id and, for i > 0, 

(Ao - M)Wi - WiA = -^2 Ai-jWj. 

However, using this formula leads to a quadratic running 
time in N . The Newton iteration presented in this section 
computes W in quasi-linear time. 

3.2 Polynomial Coefficients 

Our approach consists in reducing efficiently the resolution 
of (3) to that of an equivalent equation where the matrix A 
of power series is replaced by a matrix B of polynomials 
of low degree. This is interesting because the latter can be 
solved in linear complexity by extracting coefficients. This 
subsection describes the resolution of the equation 

x k S(Y) = Pa(Y) + Q, (9) 

where P is a polynomial matrix of degree less than k. 

Lemma 3. Suppose that Pq has good spectrum at preci- 
sion N . Then Algorithm 3 computes generators of the solu- 
tion space of Eq. (9) at precision N in time 0(n u N), with 
M e ./# n , t (K) for some t < n. 

PROOF. Extracting the coefficient of x l in Eq. (9) gives 
7i_ fe+ iYi_fe + i = q % PoYi H h q*~ k+1 P k -iYi_ k+1 + Q t . 

In any case, the equation to be solved is as indicated in the 
algorithm. For k — 1, we actually have C = Qi for all i, so 



PolCoeffsDE(P,Q,fc,7V) 

input : P £ ^#„(K[a;]) of degree less than k, 

Q € ^ n ,i(K{[x]}), N G N \ {0}, k G N \ {0} 
output: Generators of the solution space of 

x k S(Y) = Pa(Y) + Q at precision TV. 
for i = 0, . . . , N - 1 do 

C := Qi + (Piq^Yi-t + ■■■ + P k - iq *- k+1 Y- k+1 ) 
if k = 1 then 

Yi,Mi := LinSolve(( 7l ld - q r P )X = C) 
else 

Y h Mi := LinSolve(-YP X = C - t.-hi^-hi) 
end 
end 

return Y + ■ ■ ■ + Y N . lX N - 1 , [M M x x-- ■ M N . x x N ~ x \ 



all these systems are independent. For k > 1, the good spec- 
trum condition ensures that the linear system has full rank 
for all values of i, so all Mi are empty. For each i, comput- 
ing C and solving for Yi is performed in 0(71") operations, 
whence the announced complexity. □ 

3.3 Computing the Associated Equation 

Given A G ./£ n {K\[x\\) , we are looking for a matrix B 
with polynomial entries of degree less than k such that the 
associated equation (8), which does not depend on the non- 
homogeneous term C, has an invertible matrix solution. 

In this article, we content ourselves with a simple version 
of the associated equation where we choose B in such a way 
that (8) has an invertible solution V mod x k ; thus, V and 
B must satisfy Aa(V) = VB mod x k . The invertible matrix 

V is then lifted at higher precision by Newton iteration (Al- 
gorithm 6) under regularity conditions that depend on the 
spectrum of Aq. Other cases can be reduced to this setting 
by the polynomial gauge transformations that are used in 
the computation of formal solutions [2, 33]. 

When k = 1 or q ^= 1, the choice 

B = Amodx k , V~ = ld 

solves our constraints and is sufficient to solve the associated 
equation. When q = 1, k > 1 (in particular when the point 
is an irregular singular point of the equation), this is not be 
the case anymore. In that case, we use a known technique 
called the splitting lemma to prepare our equation. See for 
instance [1, Ch. 3.2] and [2] for details and generalizations. 

Lemma 4 (Splitting Lemma). Suppose that k > 1, 
that |SpecAoj = n and that SpecAo C K. Then one can 
compute in time 0(n") matrices V and B of degree less than 
k in ^/K n {K\x\) such that the following holds: Vo is invertible; 
B is diagonal; AV = VB mod x k . 

Proof. We can assume that Aq is diagonal: if not, we 
let P be in ^ n (K) such that D = P~ l AP has a diagonal 
constant term; we find V using D instead of A, and replace 

V by PV . Computing P and PV takes time 0(n"), since as 
per convention, k is considered constant in the cost analyses. 

Then, we take Bo = Ao and Vo = Id. For i > 0, we have 
to solve AoVi — ViAo — Bi = A;, where A; can be computed 
from Ax, . . . , Ai and B\, . . . , -B;-i in time 0(n"). We set the 
diagonal of Vi to 0. Since Ao is diagonal, the diagonal Bi is 
then equal to the diagonal of Ai, up to sign. Then the entry 
{£,m) in our equation reads (re — r m )V/ f ' m ' = Af' m \ with 
n , . . . , r n the (distinct) eigenvalues of Ao. This can always 
be solved, in a unique way. The total time is 0(n"). □ 

3.4 Solving the Associated Equation 

Once B and V are determined as in §3.3, we compute 
a matrix W that satisfies the associated equation (8); this 
eventually allows us to reduce (3) to an equation with poly- 
nomial coefficients. This computation of W is performed 
efficiently using a suitable version of Newton iteration for 
Eq. (8); it computes a sequence of matrices whose preci- 
sion is roughly doubled at each stage. This is described in 
Algorithm 6; our main result in this section is the following. 

Proposition 2. Suppose that Ao has good spectrum at 
precision N . Then, given a solution of the associated equa- 
tion mod x k , invertible in ^£" n (K[[x]]) , Algorithm 6 com- 
putes a solution of that equation moda;^, also invertible in 
JK n {K[\x}}), in time 0(n u M(N) + n" log(n)iV). 



Algorithm 4: Solving Eq. (10) when k = 1 or q 7^ 1 
DiffSylvester(F, m, N) 

input : T G x m ^ n (K[[x]]),m G N \ {0}, N G N \ {0} 
output: U G x m ~ k Jt n {7£\x\) solution of (10). 
for i = m, . . . , N — 1 do 

C := (Bk,- 1 ^-! + ■ • ■ + B k ^ iq l ' k+1 U t - k+1 ) 

-{Ui-xBx +■■■ + K-fc+iBfc-i) + Ti 
if k = 1 then 

Ui := Sylvester(XB + (7<ld - q'B )X = C) 
else 

Ui := Sylvester(XBo - q l B X = 
C — ^ji-k+iUi-k+i) 

end 
end 

return U m x m + ■■■ + Un-^- 1 



Before proving this result, we show how to solve yet an- 
other type of equations that appear in an intermediate step: 

x k S(U) = Ba(U) -UB + T mod x N , (10) 

where all matrices involved have size n x n, with F = mod 
x m . This is dealt with by Algorithm 4 when k = 1 or q ^ 1 
and Algorithm 5 otherwise. 

For Algorithm 4, remember that B = A mod x k . The al- 
gorithm uses a routine Sylvester solving Sylvester equations. 
Given matrices Y, V, Z in ^ n (K), we are looking for X in 
^f„(K) such that YX — XV = Z. When (Y, V) have disjoint 
spectra, this system admits a unique solution, which can be 
computed 0(n" log(n)) operations in K [26]. 

Lemma 5. Suppose that k — 1 or q 7^ 1 and that Ao has 
good spectrum at precision N . If F = mod x m , with k < 
m < N , then Algorithm 4 computes a solution U to Eq. (10) 
that satisfies (7 = mod x m ~ k+1 in time 0(n" log(n)TV). 

Proof. Extracting the coefficient of x 1 in (10) gives 
'yt-k+iUi-k+i = q'B Ui - UiB + C, 

with C as defined in Algorithm 4. In both cases k = 1 
and k > 1, this gives a Sylvester equation for each Ui, of 
the form given in the algorithm. Since Bo = Ao, the spec- 
trum assumption on Ao implies that these equations all have 
a unique solution. Since F is Omod i m , so is U (so we 
can start the loop at index m). The total running time is 
Oiji" log(n)iV) operations in K. □ 

This approach fails in the differential case (g = 1) when 
k > 1, since then the Sylvester systems are all singular. 
Algorithm 5 deals with this issue, using the fact that in this 
case, B is diagonal, and satisfies the conditions of Lemma 4. 

Lemma 6. Suppose that k > 1, q = 1 and that Ao has 
good spectrum at precision N . If F = mod x m , with k < 
m < N , then Algorithm 5 computes a solution U to Eq. (10) 
that satisfies (7 = mod x m ~ k+1 in time 0(n 2 N). 

Proof. Since B is diagonal, the (i, j)th entry of (10) is 

x k S(U {l - j) ) = (B (M) - B a ' j) )U (id) +r (lj) mod x N . 

When i = j, B (l,l) — B^'^ vanishes. After dividing by x k , 
we simply have to compute an integral, which is feasible 



Algorithm 5: Solving Eq. (10) when k > 1 and q — 1 
DiffSylvesterDifferential (r, m, AO 

input : T G x m ^ n (K[[x]}),m G N \ {0}, N G N \ {0} 
output: (7 G a; m ~ fe .^„(IC[s]) solution of (10). 
for i = 1, . . . , n do 
for j = 1, . . . , n do 

if i = j then f/ (M) := x k J (ar*T (i,i) ) mod 

else 

| U {x ' j) -- PolCoeffsDE(B (M) - k, N) 

end 
end 
end 

return U 



Algorithm 6: Newton iteration for Eq. (8) 
NewtonAE(V,AO 

input : V G ./€ n {K[x\) solution of (8) modx* 
invertible in _# n (K[[x]]), N G N \ {0} 
output: W G ^# n (K[a;]) solution of (8) modx" 

invertible in ^ n (K[[x]]), with W = V mod x k 
if N < k then return V 
else 

m:= r^=il 
H := NewtonAE(^, m) 
R := x k 8(H) - Aa{H) + HB 
if k = 1 or q 7^ 1 then 
I U := DiffSylvester(-//~ 1 R, m, AT) 
else 

I U := DiffSylvesterDifferential(-i/ _1 i?, m, N) 
end 

return H + HU 
end 



under the good spectrum assumption (we have to divide by 
the non-zero 71 = 1, . . . , jN-k = N — k). When i 7^ j, the 
conditions ensure that Lemma 3 applies (and since k > 1, 
the solution is unique, as pointed out in its proof). □ 

We now prove the correctness of Algorithm 6 for Newton 
iteration. Instead of doubling the precision at each step, 
there is a slight loss of k — 1. 

Lemma 7. Letm > k and let H G ^#„(K[a;]) be invertible 
in ^ n (K[[a;]]) and satisfy (8) moAx m . Let N be such that 
m < N < 2m — k + 1. Let R and U be as in Algorithm 6 
and suppose that A$ has good spectrum at precision N . 

Then H + HU is invertible in ^ n (¥L[[x]]) and satisfies the 
associated equation modx N . Given H, U can be computed 
in time 0{n"M(N) + n u \og(n)N). 

PROOF. By hypothesis, R = mod x m . Then 

x k S(H + HU) - Aa(H + HU) + (H + HU)B 

= {x k S(H) - Aa(H) + HB)(\d + a(U)) 
+ H(x k 8(U) + UB- BoOJ)) 

= 7?(ld + a(U)) - R mod x N = Ra(U) mod x N . 

Using either Lemma 5 or Lemma 6, U = Omod i ra ~ t+1 , 
so o~{U) = mod x m ~ k+1 . Thus, the latter expression is 0, 
since 2m - k + 1 > N. Finally, since HU = mod x m ~ k+1 , 



and m > k, H + HU remains invertible in ,/# n (K[[a;]]). The 
various matrix products and inversions take a total number 
of 0(n u 'M(N)) operations in K (using Newton iteration to 
invert H). Adding the cost of Lemma 5, resp. Lemma 6, we 
get the announced complexity. □ 

We can now prove Proposition 2. Correctness is obvious 
by repeated applications of the previous lemma. The cost 
C(N) of the computation up to precision satisfies 

C(N) = C{m) + O(n"M(A0 + n" \ognN), N > k. 

Using the super-additivity properties of the function M as 
in [17, Ch. 9], we obtain the claimed complexity. 

We can now conclude the proof of Thm. 2. In order to 
solve Equation (3), we first determine B and V as in §3.3; 
the cost will be negligible. Then, we use Proposition 2 to 
compute a matrix W that satisfies (8) modx N . Given C in 
^d,i(IK[[:r]]), we next compute F = W~ 1 C mod as . By the 
previous lemma, we conclude by solving 

x k 5(Y) = Ba(Y) + T mod x N . 

Lemma 3 gives us generators of the solution space of this 
equation moda;^. If it is inconsistent, we infer that Eq. (3) 
is. Else, from the generators (Y, M) obtained in Lemma 3, 
we deduce that (WY, WM) mod x N is a generator of the 
solution space of Eq. (3) modx^. Since the matrix M has 
few columns (at most n), the cost of all these computations 
is dominated by that of Proposition 2, as reported in Thm. 2. 

4. IMPLEMENTATION 

We implemented the divide-and-conquer and Newton it- 
eration algorithms, as well as a quadratic time algorithm, 
on top of NTL 5.5.2 [29]. In our experiments, the base field 
is K = Z/pZ, with p a 28 bit prime; the systems were drawn 
at random. Timings are in seconds, averaged over 50 runs; 
they are obtained on a single core of a 2 GHz Intel Core 2. 

Our implementation uses NTL's built-in zz_pX polyno- 
mial arithmetic, that is, works with "small" prime fields (of 
size about 2 30 over 32 bit machines, and 2 50 over 64 bits 
machines). For this data type, NTL's polynomial arithmetic 
uses a combination of naive, Karatsuba and FFT arithmetic. 

There is no built-in NTL type for polynomial matrices, 
but a simple mechanism to write one. Our polynomial ma- 
trix product is naive, of cubic cost. For small sizes such 
as n = 2 or n = 3, this is sufficient; for larger n, one 
should employ improved schemes (such as Waksman's [32], 
see also [15]) or evaluation- interpolation techniques [9]. 

Our implementation follows the descriptions given above, 
up to a few optimizations for algorithm NewtonAE (which 
are all classical in the context of Newton iteration). For 
instance, the inverse of H should not be recomputed at every 
step, but simply updated; some products can be computed 
at a lower precision than it appears (such as H _1 R, where 
R is known to have a high valuation). 

In Fig. 1, we give timings for the scalar case, with k = 1 
and g 7: 1. Clearly, the quadratic algorithm is outperformed 
for almost all values of A^; Newton iteration performs better 
than the divide-and-conquer approach, and both display a 
subquadratic behavior. Fig. 2 gives timings when n varies, 
taking k = 1 and g / 1 as before. For larger values of n, 
the divide-and-conquer approach become much better for 
this range of values of N, since it avoids costly polynomial 
matrix multiplication (see Thm. 1). 




Figure 2: Timings with k = 1, q 1 



Finally, Table 1 gives timings obtained for k — 3, for 
larger values of n (in this plot of the results would 

be less readable, due to the large gap between the divide- 
and-conquer approach and Newton iteration, in favor of the 
former); DAC stands for "divide-and-conquer". In all cases, 
the experimental results confirm to a very good extent the 
theoretical cost analyses. 
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